A Unified Runtime System for Heterogeneous Multi-core Architectures

نویسندگان

  • Cédric Augonnet
  • Raymond Namyst
چکیده

Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program over such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units. In this paper, we present an original runtime system providing a highlevel, unified execution model allowing seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with a LU decomposition for both homogeneous (3.8 speedup on 4 cores) and heterogeneous machines (95 % efficiency). We also show that a “granularity aware” scheduling can improve execution time by 35 %.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using program behaviour to exploit heterogeneous multi-core processors

Multi-core CPU architectures have become prevalent in recent years. A number of multi-core CPUs consist of not only multiple processing cores, but multiple different types of processing cores, each with different capabilities and specialisations. These heterogeneous multi-core architectures (HMAs) can deliver exceptional performance; however, they are notoriously difficult to program effectivel...

متن کامل

Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System

Core specialization is currently one of the most promising ways for designing power-efficient multicore chips. However, approaching the theoretical peak performance of such heterogeneous multicore architectures with specialized accelerators, is a complex issue. While substantial effort has been devoted to efficiently offloading parts of the computation, designing an execution model that unifies...

متن کامل

Targeting the Parallella

Heterogeneous computing involves the combined use of processing elements with different architectures and is widely considered a prerequisite in the quest for higher performance and lower power consumption. To support this trend, the OpenMP standard has been recently augmented with directives that target systems consisting of general-purpose hosts and accelerator devices that may execute portio...

متن کامل

Towards a multifrontal QR factorization for heterogeneous architectures over runtime systems

During the last decade, computer architectures for high performance computing have considerably evolved toward heterogeneous systems equipped with different types of computational units and a higher number of cores per chips. An example of popular heterogeneous architectures widely adopted in the high performance computing domain are GPU-based systems. In the work presented in this talk we stud...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008